Web wide crawl with initial seedlist and crawler configuration from September 2010
Data crawled from YouTube.com in 2007 by Internet Archive. These files are not currently accessible.
May 3, 2011, by Internet Archive
Internet Archive Liveweb Capture from WaybackMachine, captured by wwwb-proxy0.us.archive.org:wbm from Sun Mar 27 22:10:09 PDT 2011 to Mon Mar 28 05:27:05 PDT 2011.
Topic: crawldata
Web data related to World Wars I and II collected by Internet Archive in an experimental crawl sponsored by National Endowment for the Humanities and JISC. This data is currently not publicly accessible.
Shallow crawl started February 2013 that collects content 1 level deep, including embeds. Access to content is restricted. Please visit the Wayback Machine to explore archived web sites.
2004 Election crawl performed by Internet Archive. This data is currently not publicly accessible.
Internet Archive crawldata from Webwide Crawl, captured by crawl805.us.archive.org:wide from Mon May 13 17:55:38 PDT 2019 to Mon May 13 13:45:55 PDT 2019.
Topic: crawldata
Shallow crawl started November 2012 that collects content 1 level deep, including embeds. This data is currently not publicly accessible.
Data related to Hurricane Katrina collected in 2005 by Internet Archive. This data is currently not publicly accessible. From Wikipedia: Hurricane Katrina was the deadliest and most destructive Atlantic hurricane of the 2005 Atlantic hurricane season. It was the costliest natural disaster, as well as one of the five deadliest hurricanes, in the history of the United States. Among recorded Atlantic hurricanes, it was the sixth strongest overall. At least 1,833 people died in the hurricane and...
This collection contains web crawls performed as the post-inauguration crawl for part of the End of Term Web Archive, a collaborative project that aims to preserve the U.S. federal government web presence at each change of administration. Content includes publicly accessible government websites hosted on .gov, .mil, and relevant non-.gov domains, as well as government social media materials. The web archiving was performed in the winter of 2016 and spring of 2017 to capture websites...
Topics: end of term, federal government, 2016, president, congress
Feb 22, 2019, by Internet Archive
Internet Archive crawldata from Survey Webwide Crawl, captured by crawl836.us.archive.org:survey from Fri Feb 22 02:17:40 PST 2019 to Thu Feb 21 23:46:50 PST 2019.
Topic: crawldata
End of term 2008 crawl of .gov domains gathered by University of North Texas. This data is currently not publicly accessible. UNT is a student-focused, public research university located in Denton, Texas.
Internet Archive crawldata from Survey Webwide Crawl, captured by crawl344.us.archive.org:survey from Sat Sep 30 10:02:40 PDT 2017 to Sat Sep 30 03:34:44 PDT 2017.
Topic: crawldata
Jan 13, 2019, by Internet Archive
Internet Archive crawldata from Survey Webwide Crawl, captured by crawl339.us.archive.org:survey from Sat Jan 12 18:55:42 PST 2019 to Sat Jan 12 19:13:13 PST 2019.
Topic: crawldata
Dec 10, 2018, by Internet Archive
Internet Archive crawldata from Survey Webwide Crawl, captured by crawl836.us.archive.org:survey from Mon Dec 10 09:04:48 PST 2018 to Mon Dec 10 06:28:38 PST 2018.
Topic: crawldata
Data crawled by National Endowment for the Humanities and JISC on behalf of Internet Archive from Fri Aug 08 00:17:40 PDT 2008 to Thu Jun 26 05:29:33 PDT 2008
Topic: crawldata
End of Term 2016 Web Archive government web crawls by project partner the Library of Congress.
Topics: end of term, federal government, 2016, president, congress, library of congress, web, data
Wayback robots.txt crawl performed by Internet Archive. This data is currently not publicly accessible.
Internet Archive crawldata from Webwide Crawl, captured by crawl453.us.archive.org:wide from Wed Feb 19 01:09:37 PST 2014 to Tue Feb 18 21:33:27 PST 2014.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Wed Feb 19 07:58:38 PST 2014 to Wed Feb 19 05:13:46 PST 2014.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl420.us.archive.org:wide from Tue Feb 18 17:01:58 PST 2014 to Tue Feb 18 13:14:06 PST 2014.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl427.us.archive.org:wide from Wed Feb 19 09:49:01 PST 2014 to Wed Feb 19 06:07:15 PST 2014.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl454.us.archive.org:wide from Wed Feb 19 05:20:19 PST 2014 to Wed Feb 19 01:54:33 PST 2014.
Topic: crawldata
Source: ximm-collections-news-crawls-v3
Jun 5, 2012, by Internet Archive
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-liveweb.us.archive.org from 2012-06-04T16:05:30 UTC to 2012-06-05T00:00:01 UTC.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl806.us.archive.org:wide from Tue Oct 9 21:09:32 PDT 2018 to Tue Oct 9 18:23:47 PDT 2018.
Topic: crawldata
May 3, 2011, by Internet Archive
Internet Archive Liveweb Capture from WaybackMachine, captured by wwwb-proxy0.us.archive.org:wbm from Mon Mar 28 12:43:47 PDT 2011 to Mon Mar 28 16:56:17 PDT 2011.
Topic: crawldata
This collection contains all collaborative crawl data contributed by University of North Texas (UNT).
Topics: UNT, web, texas, eot
Data related to the 2004 Indian Ocean earthquake and tsunami collected by Internet Archive. This data is currently not publicly accessible.
Top 150 Alexa sites crawl performed by Internet Archive. This data is currently not publicly accessible.
Dec 13, 2012, by Internet Archive
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live0.us.archive.org from 2012-11-26T02:49:24 UTC to 2012-11-26T11:32:59 UTC.
Topic: crawldata
Jul 15, 2013, by Internet Archive
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live3.us.archive.org from 2013-07-14T21:39:49 UTC to 2013-07-15T19:15:09 UTC.
Topic: crawldata
Dec 13, 2012, by Internet Archive
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-11-26T10:28:21 UTC to 2012-11-26T21:41:23 UTC.
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Fri Nov 01 06:23:33 PDT 2002 to Tue Nov 19 23:24:07 PDT 2002
Topic: crawldata
Collaborative closure crawl of British government sites performed by Internet Archive. This data is currently not publicly accessible. From Wikipedia: GOV.UK is a United Kingdom public sector information website, created by the Government Digital Service to provide a single point of access to HM Government services.
Data crawled by Internet Archive on behalf of Internet Archive from Mon Oct 26 17:21:45 PDT 2009 to Mon Oct 26 17:58:41 PDT 2009
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Wed Jun 29 15:44:55 PDT 2016 to Wed Jun 29 10:43:30 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl813.us.archive.org:wide from Wed Jun 29 17:31:53 PDT 2016 to Wed Jun 29 15:11:43 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl835.us.archive.org:wide from Thu Jun 30 10:52:08 PDT 2016 to Thu Jun 30 06:07:26 PDT 2016.
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Sat Sep 18 12:46:38 PDT 2004 to Thu May 05 09:34:36 PDT 2005
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl429.us.archive.org:wide from Thu Apr 7 15:00:19 PDT 2016 to Thu Apr 7 09:30:57 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl825.us.archive.org:wide from Thu Apr 7 11:34:12 PDT 2016 to Thu Apr 7 06:58:33 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl429.us.archive.org:wide from Thu Apr 7 12:21:53 PDT 2016 to Thu Apr 7 07:27:22 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl812.us.archive.org:wide from Thu Apr 7 17:22:05 PDT 2016 to Thu Apr 7 13:08:49 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl803.us.archive.org:wide from Thu Apr 7 12:24:26 PDT 2016 to Thu Apr 7 08:51:27 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl812.us.archive.org:wide from Thu Apr 7 12:21:12 PDT 2016 to Thu Apr 7 07:19:38 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl421.us.archive.org:wide from Thu Apr 7 12:12:12 PDT 2016 to Thu Apr 7 07:26:33 PDT 2016.
Topic: crawldata
May 4, 2011, by Internet Archive
Internet Archive Liveweb Capture from WaybackMachine, captured by wwwb-proxy0.us.archive.org:wbm from Tue Mar 29 00:12:24 PDT 2011 to Tue Mar 29 07:24:41 PDT 2011.
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Fri Sep 26 12:13:03 PDT 2003 to Wed Dec 10 23:25:51 PDT 2003
Topic: crawldata
Aug 13, 2013, by Internet Archive
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live4.us.archive.org from 2013-08-11T22:27:04 UTC to 2013-08-13T04:44:55 UTC.
Topic: crawldata
Feb 1, 2012, by Internet Archive
Internet Archive crawldata from Webwide Crawl, captured by crawl431.us.archive.org:widewebcap from Tue Jan 17 22:09:27 UTC 2012 to Wed Feb 1 11:19:58 UTC 2012.
Topic: crawldata